Some theoretical open problems in data mining

Author

  • Anders Schack-Nielsen
Abstract

This document contains a selection of open problems provided by participants of the data mining PhD reading seminar at the IT University of Copenhagen in fall 2009. It is not intended to showcase the “most significant” open problems in the field, only problems that the participants found interesting. In particular, the selection reflects the theoretical angle of the seminar.

1 Frequent itemsets: a relaxed version

Suggested by Rasmus Pagh. This is an attempt to formulate an “easiest possible” version of the frequent itemsets problem that would still have interesting (theoretical) implications for association mining. Given a collection of sets $S_1, \dots, S_n \subseteq U$, define $\mathrm{sup}(S) = |\{i \mid S \subseteq S_i\}|$, and let $\binom{U}{k}$ denote the set of $k$-element subsets of $U$. For parameters $\varepsilon > 0$ and $k, \Delta, t \in \mathbb{N}$, where it is promised that

$$\left|\left\{S \in \tbinom{U}{k} \;\middle|\; \mathrm{sup}(S) \geq (1+\varepsilon)\Delta\right\}\right| \geq t,$$

the task is to output $t$ sets $T_1, \dots, T_t \in \binom{U}{k}$ where $\mathrm{sup}(T_i) \geq \Delta$. The following may be assumed:

  • Parameter restrictions: $|U| = O(n^2)$; $\Delta = O(\log n)$ for constant $\varepsilon$.
  • $k$ and $1/\varepsilon$ are small numbers, so exponential dependence on them is acceptable.
  • It is acceptable that the output satisfies the guarantee only with probability 1/2.

The problem is to come up with algorithms and/or hardness results for the above. We note that hardness results for related problems were shown in [11].

2 Deterministic frequent items [2, Q4]

Suggested by Anders Schack-Nielsen. Given a stream of insertions and deletions of items (the set of items is $\{1, 2, \dots, n\}$), we wish to estimate the frequency of each item to within an $\varepsilon$-factor of the $L_1$-norm $m$ of the frequencies, that is, with additive error at most $\varepsilon m$. A space lower bound for this problem is $\Omega(\varepsilon^{-1} \log(m) \log(\varepsilon n))$ [10]. The Count-Min algorithm is a randomized $O(\varepsilon^{-1} \log(m) \log(\delta^{-1}))$-space algorithm that succeeds with probability $1 - \delta$ [6]. In [10] a deterministic algorithm is given that uses space $O(\varepsilon^{-2} \log_{\varepsilon^{-1}}(n) \log(\log_{\varepsilon^{-1}}(n)) \log m)$.
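As a rough illustration of the randomized Count-Min algorithm cited as [6] above, the following Python sketch keeps about ln(1/δ) rows of about e/ε counters each, which matches the stated space bound when each counter takes O(log m) bits. The width and depth are the standard parameter choices for this structure. The class name `CountMinSketch`, the particular hash family, the seed, and the example stream are assumptions made for illustration and do not come from the seminar notes. The sketch also assumes that no item's frequency ever becomes negative, so that taking the minimum over rows never underestimates.

```python
import math
import random


class CountMinSketch:
    """Illustrative Count-Min sketch for a stream of insertions and deletions.

    Assumption: the frequency of every item stays non-negative at all times.
    """

    _PRIME = (1 << 61) - 1  # Mersenne prime, assumed larger than any item id

    def __init__(self, epsilon, delta, seed=0):
        # Standard choices: width ~ e/epsilon counters per row, depth ~ ln(1/delta) rows.
        self.width = math.ceil(math.e / epsilon)
        self.depth = max(1, math.ceil(math.log(1.0 / delta)))
        rng = random.Random(seed)
        # One hash h(x) = ((a*x + b) mod p) mod width per row, with random a, b.
        self.hashes = [
            (rng.randrange(1, self._PRIME), rng.randrange(0, self._PRIME))
            for _ in range(self.depth)
        ]
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _bucket(self, row, item):
        a, b = self.hashes[row]
        return ((a * item + b) % self._PRIME) % self.width

    def update(self, item, count=1):
        """Apply an insertion (count > 0) or a deletion (count < 0) of `item`."""
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count

    def estimate(self, item):
        """Return the smallest counter that `item` hashes to across all rows."""
        return min(
            self.table[row][self._bucket(row, item)] for row in range(self.depth)
        )


# Hypothetical stream of (item, count) updates; item 3 ends with frequency 4.
sketch = CountMinSketch(epsilon=0.01, delta=0.01)
for item, count in [(3, 5), (7, 2), (3, -1), (9, 4)]:
    sketch.update(item, count)
print(sketch.estimate(3))
```

Under the non-negativity assumption, each counter holds the true frequency of the queried item plus the frequencies of any colliding items, so the estimate is never below the true frequency, and it exceeds it by more than εm only with probability at most δ. The open problem in Section 2 asks how close a deterministic algorithm can get to this randomized space usage.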




Publication date: 2009